Preparing for Production
When put into production, code gets used more often and on more data.
We will likely have to consider the scalability of our methods in terms of:
Computation time
Memory requirements
When doing so, we have to trade off development costs against usage costs.
MCMC originally took ~24 hours.
Identifying and amending bottlenecks in code reduced this to ~24 minutes.
Is this actually better?
safe / stable / general / readable
trade for scalability
Sub-optimal optimisation can be worse than doing nothing
… programmers have spent far too much time worrying about efficiency in the wrong places and at the wrong times; premature optimisation is the root of all evil (or at least most of it) in programming. - Donald Knuth
R as a stopwatch
Time difference of 0.5097461 secs
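A minimal sketch of using R as a stopwatch. The call to `Sys.sleep(0.5)` is only a stand-in for the code being timed (an assumption, chosen because it gives output of the same form as shown above).

```r
# Record the clock before and after the code of interest.
start_time <- Sys.time()
Sys.sleep(0.5)                      # stand-in for the code being timed
end_time <- Sys.time()

elapsed <- end_time - start_time    # a "difftime" object
print(elapsed)                      # prints "Time difference of ... secs"

# system.time() wraps the same idea and also reports user and system CPU time.
timing <- system.time(Sys.sleep(0.5))
print(timing)
```

Wall-clock timings like these include whatever else the machine is doing, so a single measurement is only a rough guide.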
To diagnose scaling issues you have to understand what your code is doing.
Stop the code at time \(\tau\) and examine the call-stack.
Do this many times and you can estimate how much working memory (RAM) is used over time and the proportion of time spent evaluating each function.
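The sampling idea above can be sketched with base R's `Rprof()`, which interrupts the code at regular intervals and records the call stack. The function `slow_cumsum()` is a hypothetical example, written to be deliberately inefficient so that the profiler has something to find.

```r
# Hypothetical slow function: grows a vector one element at a time,
# so R has to copy the whole vector on every iteration.
slow_cumsum <- function(n) {
  out <- numeric(0)
  for (i in seq_len(n)) {
    out <- c(out, sum(seq_len(i)))
  }
  out
}

# Sample the call stack every 0.01 seconds while the function runs.
profile_file <- tempfile(fileext = ".out")
Rprof(profile_file, interval = 0.01)
result <- slow_cumsum(3e4)
Rprof(NULL)                          # stop profiling

# Summarise the proportion of samples spent in each function.
prof_summary <- summaryRprof(profile_file)
head(prof_summary$by.self)
```

The `by.self` table is an estimate built from samples, which is why it changes slightly from run to run.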
You will get slightly different results each time you run the function.
Use set.seed() to make a fair comparison over many runs.
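One way to make such a comparison fair is to reset the seed before timing each candidate, so that both work on the same simulated data. This is a sketch only: `mean_loop()`, `mean_vectorised()` and `time_many()` are hypothetical helpers, not functions from the original slides.

```r
# Two hypothetical implementations of the same calculation.
mean_loop <- function(n) {
  x <- rnorm(n)
  total <- 0
  for (value in x) total <- total + value
  total / n
}
mean_vectorised <- function(n) mean(rnorm(n))

# Time f(n) repeatedly, starting every comparison from the same seed.
time_many <- function(f, n, runs = 20, seed = 1234) {
  set.seed(seed)
  replicate(runs, system.time(f(n))[["elapsed"]])
}

times_loop <- time_many(mean_loop, 1e5)
times_vec  <- time_many(mean_vectorised, 1e5)

summary(times_loop)
summary(times_vec)
```

Comparing the distributions of many timings, rather than one run of each, reduces the influence of background activity on the machine.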
profvis() can similarly measure the memory usage of your code.
Use and write functions with vectorised inputs.
Be careful of recycling!
More on vectorising: Noam Ross Blog Post
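A short illustration of vectorised code and of the recycling pitfall. The vectors used here are made up for the example.

```r
x <- c(1, 2, 3, 4, 5, 6)

# Loop version: one element at a time.
squares_loop <- numeric(length(x))
for (i in seq_along(x)) {
  squares_loop[i] <- x[i]^2
}

# Vectorised version: the whole vector in one call.
squares_vec <- x^2
identical(squares_loop, squares_vec)

# Recycling: the shorter vector is repeated to match the longer one.
# Because 6 is a multiple of 2, this happens silently.
recycled <- x + c(10, 20)

# R only warns when the longer length is not a multiple of the shorter.
recycling_warning <- tryCatch(
  x + 1:4,
  warning = function(w) conditionMessage(w)
)
```

Silent recycling is the dangerous case, so it is worth checking input lengths explicitly inside functions that accept vector arguments.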
Functional programming equivalent of a for loop. [apply(), mapply(), lapply(), …]
Apply a function to each element of a list-like object.
     [,1] [,2] [,3] [,4]
[1,]    1    4    7   10
[2,]    2    5    8   11
[3,]    3    6    9   12
Generalises functions from {matrixStats}
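A sketch of `apply()` on a matrix, assuming the matrix printed above was created with `matrix(1:12, nrow = 3)` (the original call is not shown). The second argument selects the margin: 1 for rows, 2 for columns.

```r
m <- matrix(1:12, nrow = 3)

# Apply a function to each row (MARGIN = 1) or each column (MARGIN = 2).
row_totals <- apply(m, 1, sum)
col_maxima <- apply(m, 2, max)

# For common summaries, dedicated functions such as rowSums() give the
# same answer; apply() is the general version that accepts any function.
all(row_totals == rowSums(m))

# A summary with no dedicated base function: the range of each column.
col_ranges <- apply(m, 2, function(column) max(column) - min(column))
```

Dedicated row and column functions are usually faster than `apply()`, so the general form is best kept for summaries that have no specialised equivalent.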
Iterate over a single object with map():
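A minimal sketch of `map()`, assuming the {purrr} package is installed. The list `samples` is invented for illustration.

```r
library(purrr)

samples <- list(a = 1:5, b = c(2, 4, 6), c = 10)

# map() always returns a list, with one element per input element.
means_list <- map(samples, mean)

# Typed variants such as map_dbl() return an atomic vector instead,
# and fail if any result is not a single number.
means_vec <- map_dbl(samples, mean)
```

Choosing the typed variant documents the expected output type and catches surprises early.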
For more details and variants see Advanced R chapters 9-11 on functional programming.
{parallel} and {future} allow parallel computation across multiple cores.
Powerful, but steep learning curve.
{furrr} makes this very easy: just add the prefix future_ to purrr verbs.
Need to be very careful handling RNG. See R-bloggers for more details.
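To illustrate the RNG point, here is a sketch using base R's {parallel} rather than {furrr}. It assumes two worker processes can be started on the machine; `simulate_mean()` and `run_parallel()` are hypothetical helpers. `clusterSetRNGStream()` gives each worker its own reproducible random number stream.

```r
library(parallel)

# Hypothetical task: the mean of 100 standard normal draws.
simulate_mean <- function(i) mean(rnorm(100))

run_parallel <- function(seed) {
  cl <- makeCluster(2)                    # start two worker processes
  on.exit(stopCluster(cl))                # always shut the workers down
  clusterSetRNGStream(cl, iseed = seed)   # independent stream per worker
  parSapply(cl, 1:8, simulate_mean)
}

first_run  <- run_parallel(2024)
second_run <- run_parallel(2024)

# With the same seed and the same number of workers,
# the two runs should agree exactly.
identical(first_run, second_run)
```

Calling `set.seed()` in the main session alone does not control the workers, which is the usual source of irreproducible parallel results. {furrr} handles this through its own `seed` option in `furrr_options()`.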
Effective Data Science: Production - Scalability - Zak Varty